Pruning Strategies Based on the Upper Bound of Information Gain for Discriminative Subgraph Mining

نویسندگان

  • Kouzou Ohara
  • Masahiro Hara
  • Kiyoto Takabayashi
  • Hiroshi Motoda
  • Takashi Washio
چکیده

Given a set of graphs with class labels, discriminative subgraphs appearing therein are useful to construct a classification model. A graph mining technique called Chunkingless Graph-Based Induction (Cl-GBI) can find such discriminative subgraphs from graph structured data. But, it sometimes happens that Cl-GBI cannot extract subgraphs that are good enough to characterize the given data due to its time and space complexities. Thus, to improve its efficiency, we propose pruning methods based on the upper-bound of information gain that is used as a criterion for discriminability of subgraphs in Cl-GBI. The upper-bound of information gain of a subgraph is the maximal one that its super graph can achieve. By comparing the upper-bound of each subgraph with the best information gain at the moment, Cl-GBI can exclude unfruitful subgraphs from its search space. Furthermore, we experimentally evaluate the effectiveness of the pruning methods on a real world and artificial datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discriminative Feature Selection for Uncertain Graph Classification

Mining discriminative features for graph data has attracted much attention in recent years due to its important role in constructing graph classifiers, generating graph indices, etc. Most measurement of interestingness of discriminative subgraph features are defined on certain graphs, where the structure of graph objects are certain, and the binary edges within each graph represent the "presenc...

متن کامل

Semi-supervised Clustering of Graph Objects: A Subgraph Mining Approach

Semi-supervised clustering has recently received a lot of attention in the literature, which aims to improve the clustering performance with limited supervision. Most existing semi-supervised clustering studies assume that the data is represented in a vector space, e.g., text and relational data. When the data objects have complex structures, e.g., proteins and chemical compounds, those semi-su...

متن کامل

RP-growth: Top-k Mining of Relevant Patterns with Minimum Support Raising

One practical inconvenience in frequent pattern mining is that it often yields a flood of common or uninformative patterns, and thus we should carefully adjust the minimum support. To alleviate this inconvenience, based on FP-growth, this paper proposes RP-growth, an efficient algorithm for top-k mining of discriminative patterns which are highly relevant to the class of interest. RP-growth con...

متن کامل

Efficient Mining of Top-k Breaker Emerging Subgraph Patterns from Graph Datasets

This paper introduces a new type of discriminative subgraph pattern called breaker emerging subgraph pattern by introducing three constraints and two new concepts: base and breaker. A breaker emerging subgraph pattern consists of three subpatterns: a constrained emerging subgraph pattern, a set of bases and a set of breakers. An efficient approach is proposed for the discovery of top-k breaker ...

متن کامل

Cohesive Subgraph Mining on Attributed Graph

Finding cohesive subgraphs is a fundamental graph problem with a wide spectrum of applications. In this paper, we investigate this problem in the context of attributed graph, where each vertex is associated with content (e.g., geo-locations, tags and keywords). To properly capture the cohesiveness of the vertices in a subgraph from both graph structure and vertices attribute perspectives, we ad...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008